Airbnb Seattle Analysis

The dataset used was the Airbnb Seattle dataset.

The analysis is done from the perspective of a prospective host who wants to rent out an Airbnb listing to earn revenue. The host wants to understand what contributes to becoming a successful Airbnb host, which here means having both a high rating and a high revenue turnover.

The three fundamental questions that we hope to answer with this dataset are:

  1. Which amenities are the most important and play a role in obtaining a high rating, and therefore in increasing revenue? Is there any correlation between one amenity and another and, if so, how strong is it?

  2. What are the general characteristics of Airbnbs, and of their hosts, with a higher rating?

  3. How do prices and the number of bookings fluctuate over time? What trends are observed within the Airbnbs of the three different categories?

InΒ [Β ]:
import numpy as np
import pandas as pd
import seaborn as sns
import datetime as dt
import matplotlib.pyplot as plt
InΒ [Β ]:
df = pd.read_csv('listings.csv')
InΒ [Β ]:
df.head()
df.shape
df.info()
df.describe()
df.columns
df.dtypes
df.hist(figsize=(15,15))
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3818 entries, 0 to 3817
Data columns (total 92 columns):
 #   Column                            Non-Null Count  Dtype  
---  ------                            --------------  -----  
 0   id                                3818 non-null   int64  
 1   listing_url                       3818 non-null   object 
 2   scrape_id                         3818 non-null   int64  
 3   last_scraped                      3818 non-null   object 
 4   name                              3818 non-null   object 
 5   summary                           3641 non-null   object 
 6   space                             3249 non-null   object 
 7   description                       3818 non-null   object 
 8   experiences_offered               3818 non-null   object 
 9   neighborhood_overview             2786 non-null   object 
 10  notes                             2212 non-null   object 
 11  transit                           2884 non-null   object 
 12  thumbnail_url                     3498 non-null   object 
 13  medium_url                        3498 non-null   object 
 14  picture_url                       3818 non-null   object 
 15  xl_picture_url                    3498 non-null   object 
 16  host_id                           3818 non-null   int64  
 17  host_url                          3818 non-null   object 
 18  host_name                         3816 non-null   object 
 19  host_since                        3816 non-null   object 
 20  host_location                     3810 non-null   object 
 21  host_about                        2959 non-null   object 
 22  host_response_time                3295 non-null   object 
 23  host_response_rate                3295 non-null   object 
 24  host_acceptance_rate              3045 non-null   object 
 25  host_is_superhost                 3816 non-null   object 
 26  host_thumbnail_url                3816 non-null   object 
 27  host_picture_url                  3816 non-null   object 
 28  host_neighbourhood                3518 non-null   object 
 29  host_listings_count               3816 non-null   float64
 30  host_total_listings_count         3816 non-null   float64
 31  host_verifications                3816 non-null   object 
 32  host_has_profile_pic              3816 non-null   object 
 33  host_identity_verified            3816 non-null   object 
 34  street                            3818 non-null   object 
 35  neighbourhood                     3402 non-null   object 
 36  neighbourhood_cleansed            3818 non-null   object 
 37  neighbourhood_group_cleansed      3818 non-null   object 
 38  city                              3818 non-null   object 
 39  state                             3818 non-null   object 
 40  zipcode                           3811 non-null   object 
 41  market                            3818 non-null   object 
 42  smart_location                    3818 non-null   object 
 43  country_code                      3818 non-null   object 
 44  country                           3818 non-null   object 
 45  latitude                          3818 non-null   float64
 46  longitude                         3818 non-null   float64
 47  is_location_exact                 3818 non-null   object 
 48  property_type                     3817 non-null   object 
 49  room_type                         3818 non-null   object 
 50  accommodates                      3818 non-null   int64  
 51  bathrooms                         3802 non-null   float64
 52  bedrooms                          3812 non-null   float64
 53  beds                              3817 non-null   float64
 54  bed_type                          3818 non-null   object 
 55  amenities                         3818 non-null   object 
 56  square_feet                       97 non-null     float64
 57  price                             3818 non-null   object 
 58  weekly_price                      2009 non-null   object 
 59  monthly_price                     1517 non-null   object 
 60  security_deposit                  1866 non-null   object 
 61  cleaning_fee                      2788 non-null   object 
 62  guests_included                   3818 non-null   int64  
 63  extra_people                      3818 non-null   object 
 64  minimum_nights                    3818 non-null   int64  
 65  maximum_nights                    3818 non-null   int64  
 66  calendar_updated                  3818 non-null   object 
 67  has_availability                  3818 non-null   object 
 68  availability_30                   3818 non-null   int64  
 69  availability_60                   3818 non-null   int64  
 70  availability_90                   3818 non-null   int64  
 71  availability_365                  3818 non-null   int64  
 72  calendar_last_scraped             3818 non-null   object 
 73  number_of_reviews                 3818 non-null   int64  
 74  first_review                      3191 non-null   object 
 75  last_review                       3191 non-null   object 
 76  review_scores_rating              3171 non-null   float64
 77  review_scores_accuracy            3160 non-null   float64
 78  review_scores_cleanliness         3165 non-null   float64
 79  review_scores_checkin             3160 non-null   float64
 80  review_scores_communication       3167 non-null   float64
 81  review_scores_location            3163 non-null   float64
 82  review_scores_value               3162 non-null   float64
 83  requires_license                  3818 non-null   object 
 84  license                           0 non-null      float64
 85  jurisdiction_names                3818 non-null   object 
 86  instant_bookable                  3818 non-null   object 
 87  cancellation_policy               3818 non-null   object 
 88  require_guest_profile_picture     3818 non-null   object 
 89  require_guest_phone_verification  3818 non-null   object 
 90  calculated_host_listings_count    3818 non-null   int64  
 91  reviews_per_month                 3191 non-null   float64
dtypes: float64(17), int64(13), object(62)
memory usage: 2.7+ MB
[Figure: grid of histograms from df.hist, one panel per numeric column, from id and scrape_id through calculated_host_listings_count and reviews_per_month]

Data preparation


Columns to be removed:

Redundant columns with no use in the analysis:

  1. host_url, medium_url, host_thumbnail_url, thumbnail_url, picture_url, host_picture_url, listing_url, xl_picture_url: These URLs had no use in the data analysis.
  2. space, summary, notes, neighborhood_overview, description, experiences_offered: These features consisted of long descriptive paragraphs about the listing and its features, which had no use in the data analysis process.
  3. street, transit, latitude, zipcode, state, scrape_id, country_code, longitude, country, host_about: These details were covered by the address and city columns. A micro-analysis of streets and specific coordinates would not be done. The state and country details aren't needed since this dataset is limited to Seattle, Washington, USA.
InΒ [Β ]:
df = df[['id', 'last_scraped', 'name', 'host_id', 'host_name', 'host_since', 'host_location', 'host_response_time',
         'host_response_rate', 'host_acceptance_rate', 'host_is_superhost', 'host_neighbourhood', 'host_listings_count',
         'host_total_listings_count', 'host_verifications', 'host_has_profile_pic', 'host_identity_verified', 'neighbourhood',
         'neighbourhood_cleansed', 'neighbourhood_group_cleansed', 'city', 'market', 'smart_location', 'is_location_exact',
         'property_type', 'room_type', 'accommodates', 'bathrooms', 'bedrooms', 'beds', 'bed_type', 'amenities', 'square_feet',
         'price', 'weekly_price', 'monthly_price', 'security_deposit', 'cleaning_fee', 'guests_included', 'extra_people',
         'minimum_nights', 'maximum_nights', 'calendar_updated', 'has_availability', 'availability_30', 'availability_60',
         'availability_90', 'availability_365', 'calendar_last_scraped', 'number_of_reviews', 'first_review', 'last_review',
         'review_scores_rating', 'review_scores_accuracy','review_scores_cleanliness', 'review_scores_checkin',
         'review_scores_communication', 'review_scores_location', 'review_scores_value', 'requires_license', 'license',
         'jurisdiction_names', 'instant_bookable', 'cancellation_policy', 'require_guest_profile_picture',
         'require_guest_phone_verification', 'calculated_host_listings_count', 'reviews_per_month']]
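The cell above prunes columns by listing every column to keep. As a minimal sketch of the opposite approach, the columns named in the removal list can be dropped directly. The toy frame and the `cols_to_drop` list below are illustrative assumptions, not the notebook's data.

```python
import pandas as pd

# Toy frame standing in for the listings data (values are made up).
toy = pd.DataFrame({
    "id": [1, 2],
    "listing_url": ["https://example.com/1", "https://example.com/2"],
    "summary": ["cozy", "bright"],
    "price": ["$85.00", "$150.00"],
})

# Hypothetical drop list; "latitude" is absent from the toy frame on purpose.
cols_to_drop = ["listing_url", "summary", "latitude"]

# errors="ignore" skips names that are not present instead of raising KeyError.
trimmed = toy.drop(columns=cols_to_drop, errors="ignore")

print(trimmed.columns.tolist())
```

Dropping by name keeps the code in step with the written removal list, at the cost of silently keeping any column that was not listed.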
InΒ [Β ]:
# examine columns with a high share of missing (NaN) values

print("columns with missing values > 80%: ", df.columns[df.isnull().sum() > 0.80 * df.shape[0]].tolist())
print("columns with missing values > 50%: ", df.columns[df.isnull().sum() > 0.50 * df.shape[0]].tolist())
columns with missing values > 80%:  ['square_feet', 'license']
columns with missing values > 50%:  ['square_feet', 'monthly_price', 'security_deposit', 'license']
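The same thresholds can be expressed through the per-column share of missing values, since the mean of a boolean mask is the fraction of True entries. This is a small sketch on made-up data; the names `toy` and `missing_share` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame with six rows (values are made up).
toy = pd.DataFrame({
    "square_feet": [np.nan, np.nan, np.nan, np.nan, np.nan, 800.0],
    "monthly_price": [np.nan, np.nan, np.nan, np.nan, "$2,000.00", "$3,000.00"],
    "price": ["$85.00", "$150.00", "$975.00", "$100.00", "$450.00", "$120.00"],
})

# Fraction of missing values per column.
missing_share = toy.isnull().mean()

over_80 = missing_share[missing_share > 0.80].index.tolist()
over_50 = missing_share[missing_share > 0.50].index.tolist()

print("columns with missing values > 80%:", over_80)
print("columns with missing values > 50%:", over_50)
```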
InΒ [Β ]:
# find columns that hold a single unique value (they carry no information)

x=df.nunique() == 1
similar_cols_vals = (x[x == True].keys().tolist())
print("columns with similar vals: ", similar_cols_vals)
columns with similar vals:  ['last_scraped', 'market', 'has_availability', 'calendar_last_scraped', 'requires_license', 'jurisdiction_names']
InΒ [Β ]:
# drop the single-valued columns found above

df = df.drop(columns=similar_cols_vals)

# since neighbourhood_cleansed already exists containing cleaned data
df = df.drop(columns=['neighbourhood'])

Split the amenities feature column into multiple columns, each holding a truth value for whether that specific amenity is present in the data point.


InΒ [Β ]:
import re

#using regex to clean the amenities string column
df['amenities'] = df['amenities'].str.replace(r'[{}"]', '', regex=True)
InΒ [Β ]:
#extracting all unique amenities by splitting
amenities_values = df.amenities
newarray = list()
for row in amenities_values:
    row = row.split(',')
    for i in row:
        newarray.append(i)
amenities_values = set(newarray)
InΒ [Β ]:
# build a working dataframe of id and amenities; it is merged into the main dataframe (df) later
df_amen = pd.DataFrame(df[['id', 'amenities']])
InΒ [Β ]:
df_amen = df_amen.reindex(columns=[*df_amen.columns.tolist(), *amenities_values], fill_value=0)

one hot encoding for amenities

InΒ [Β ]:
for index, row in df_amen.iterrows():
    # split this listing's amenities string and flag each amenity column
    amenities = row['amenities'].split(',')
    for vals in amenities:
        df_amen.at[index, vals] = 1
print('Complete!')
Complete!
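The row-by-row loop above can also be written with pandas' vectorised `Series.str.get_dummies`, which splits each string on a separator and returns one indicator column per token. The sketch below uses a three-row toy frame (made-up values) rather than the listings data, and is an alternative illustration, not the method used in this notebook.

```python
import pandas as pd

# Toy frame mimicking the raw amenities format (values are made up).
toy = pd.DataFrame({
    "id": [1, 2, 3],
    "amenities": ['{TV,"Wireless Internet",Kitchen}', '{TV,Washer,Dryer}', '{}'],
})

# Same cleaning step as above: strip braces and double quotes.
cleaned = toy["amenities"].str.replace(r'[{}"]', "", regex=True)

# One indicator column per amenity; empty tokens do not produce a column.
dummies = cleaned.str.get_dummies(sep=",").add_prefix("amenities_")

encoded = pd.concat([toy[["id"]], dummies], axis=1)
print(encoded)
```

Because empty tokens are discarded, there is no blank column to drop afterwards, and the prefix is applied in one step.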
InΒ [Β ]:
#drop amenities feature column and empty column
df_amen.drop(columns=['amenities', ''], inplace=True)
InΒ [Β ]:
df_amen.head(2)
Out[Β ]:
id Breakfast Lock on Bedroom Door Hangers Indoor Fireplace Free Parking on Premises 24-Hour Check-in First Aid Kit Internet Hot Tub ... Fire Extinguisher Essentials Iron Washer Dryer Pool Smoking Allowed Air Conditioning Hair Dryer Buzzer/Wireless Intercom
0 241032 0 0 0 0 0 0 0 1 0 ... 0 0 0 1 1 0 0 1 0 0
1 953595 0 0 0 0 1 0 1 1 0 ... 1 1 0 1 1 0 0 0 0 1

2 rows × 42 columns

InΒ [Β ]:
df = pd.concat([df, df_amen], axis = 1, sort=True)
InΒ [Β ]:
df.drop(columns='amenities', inplace = True)
InΒ [Β ]:
#renaming columns: adding the prefix amenities_ for easy identification

for cols in df:
    if cols in amenities_values:
        new_val = 'amenities_' + cols
        df.rename(columns={cols:new_val}, inplace = True)
InΒ [Β ]:
#removing duplicate columns
df = df.loc[:,~df.columns.duplicated()]
InΒ [Β ]:
df.shape
Out[Β ]:
(3818, 101)

Split the host_verifications feature column into multiple columns, each holding a truth value for whether that specific verification is present in the data point.


InΒ [Β ]:
#exploring dataset
df['host_verifications'][:4]
Out[Β ]:
host_verifications
0 ['email', 'phone', 'reviews', 'kba']
1 ['email', 'phone', 'facebook', 'linkedin', 're...
2 ['email', 'phone', 'google', 'reviews', 'jumio']
3 ['email', 'phone', 'facebook', 'reviews', 'jum...

InΒ [Β ]:
#using regex to clean this string column
df['host_verifications'] = df['host_verifications'].str.replace(r"[\[\]' ]", '', regex = True)
InΒ [Β ]:
df['host_verifications'][:4]
Out[Β ]:
host_verifications
0 email,phone,reviews,kba
1 email,phone,facebook,linkedin,reviews,jumio
2 email,phone,google,reviews,jumio
3 email,phone,facebook,reviews,jumio

InΒ [Β ]:
#split feature column
host_verifications_values = df.host_verifications
newarray = list()
for row in host_verifications_values:
    if isinstance(row, str):
        row = row.split(',')
        for i in row:
            newarray.append(i.strip())

host_verifications_values = set(newarray)
InΒ [Β ]:
df_host_verifications = pd.DataFrame(df[['host_id', 'host_verifications']])
df_host_verifications = df_host_verifications.reindex(columns=[*df_host_verifications.columns.tolist(), *host_verifications_values], fill_value=0)

#one hot encoding
for index, row in df_host_verifications.iterrows():
    # check the cell value (row is a Series, never a str); skip missing values
    verifications = row['host_verifications']
    if isinstance(verifications, str):
        for vals in verifications.split(','):
            df_host_verifications.at[index, vals] = 1

print("Finished!")
Finished!

Merging the obtained host verification data frame with main df

InΒ [Β ]:
df = pd.concat([df, df_host_verifications], axis=1)
InΒ [Β ]:
df.drop(columns='host_verifications', inplace=True)
InΒ [Β ]:
#rename columns: add the prefix host_verifications_

for cols in df:
    if cols in host_verifications_values:
        new_val = 'host_verifications_' + cols
        df.rename(columns={cols:new_val}, inplace=True)
InΒ [Β ]:
df.shape
Out[Β ]:
(3818, 116)
InΒ [Β ]:
df = df.loc[:,~df.columns.duplicated()]

Analyze data


To reiterate, the three fundamental questions that we hope to answer with this dataset are:

  1. Which amenities are the most important and play a role in obtaining a high rating, and therefore in increasing revenue? Is there any correlation between one amenity and another and, if so, how strong is it?

  2. What are the general characteristics of Airbnbs, and of their hosts, with a higher rating?

  3. How do prices and the number of bookings fluctuate over time? What trends are observed within the Airbnbs of the three different categories?

InΒ [Β ]:
df_analysis = df.copy(deep=True)
InΒ [Β ]:
df_analysis2 = df_analysis.copy(deep=True)
# df_analysis2 = (df_analysis2[pd.notnull(df_analysis2['host_is_superhost'])]).copy(deep=True)
# df_analysis2.dropna(inplace=True)
df_analysis2['host_is_superhost'] = df_analysis2['host_is_superhost'].map({'t': 1.0, 'f': 0.0})
  1. Which amenities are the most important and play a role in obtaining a high rating, and therefore in increasing revenue? Is there any correlation between one amenity and another and, if so, how strong is it?
InΒ [Β ]:
is_superhost_one = df_analysis2.select_dtypes([np.number]) #selecting only numeric vars
is_superhost_one = is_superhost_one.drop(columns=['id', 'host_id'])

#m = ~(is_superhost_one.mask(np.eye(len(is_superhost_one), dtype=bool)).abs() > 0.4).any()
InΒ [Β ]:
is_superhost_one = is_superhost_one.corr()
sns.set(rc={'figure.figsize': (19, 19)})
sns.heatmap(is_superhost_one,
            xticklabels=is_superhost_one.columns.values,
            yticklabels=is_superhost_one.columns.values, cmap='coolwarm')
[Figure: correlation heatmap of the numeric features (coolwarm colormap)]
InΒ [Β ]:
def get_redundant_pairs(df):
    '''get diagonal and lower triangular pairs of correlation matrix'''
    pairs_to_drop = set()
    cols = df.columns
    for i in range(0, df.shape[1]):
        for j in range(0, i+1):
            pairs_to_drop.add((cols[i], cols[j]))
    return pairs_to_drop

def get_top_abs_correlations(df, n=5):
    au_corr = df.corr().abs().unstack()
    labels_to_drop = get_redundant_pairs(df)
    au_corr = au_corr.drop(labels=labels_to_drop).sort_values(ascending=False)
    return au_corr[0:n]

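# is_superhost_one was reassigned to its correlation matrix in the cell above,
# so get_top_abs_correlations() correlates the columns of that matrix
# (their correlation profiles), not the raw feature values.
print("top absolute correlations")
a = get_top_abs_correlations(is_superhost_one, 20)
print(a)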
top absolute correlations
host_listings_count                  host_total_listings_count              1.000000
amenities_Washer                     amenities_Dryer                        0.998781
availability_60                      availability_90                        0.998334
availability_30                      availability_60                        0.996001
                                     availability_90                        0.990096
accommodates                         beds                                   0.986765
amenities_Hangers                    amenities_Hair Dryer                   0.986213
amenities_Iron                       amenities_Hair Dryer                   0.985907
amenities_Hangers                    amenities_Iron                         0.981137
                                     amenities_Laptop Friendly Workspace    0.971552
amenities_Laptop Friendly Workspace  amenities_Hair Dryer                   0.962690
bedrooms                             beds                                   0.962489
accommodates                         bedrooms                               0.959537
amenities_Laptop Friendly Workspace  amenities_Iron                         0.958119
amenities_Hangers                    amenities_24-Hour Check-in             0.955385
review_scores_rating                 review_scores_value                    0.953924
review_scores_checkin                review_scores_communication            0.953799
amenities_24-Hour Check-in           amenities_Hair Dryer                   0.952443
                                     amenities_Laptop Friendly Workspace    0.945604
                                     amenities_Iron                         0.940274
dtype: float64
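For comparison, a ranking of pairwise correlations can be computed directly from the raw feature columns by masking the correlation matrix to its strict upper triangle, which removes the diagonal and the mirrored pairs. This is a minimal sketch on made-up data; the function name `top_abs_correlations` and the toy columns are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def top_abs_correlations(features: pd.DataFrame, n: int = 5) -> pd.Series:
    """Return the n largest absolute pairwise correlations of the columns."""
    corr = features.corr().abs()
    # True only above the diagonal: each unordered pair appears exactly once.
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack().dropna()
    return pairs.sort_values(ascending=False).head(n)

# Toy indicator columns (values are made up).
toy = pd.DataFrame({
    "washer": [1, 1, 0, 0, 1, 0],
    "dryer":  [1, 1, 0, 0, 1, 0],
    "pool":   [0, 1, 0, 1, 0, 0],
})

top = top_abs_correlations(toy, n=2)
print(top)
```

The function should be given the numeric feature frame itself, not a frame that already holds correlation values.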

Analysis of Correlated Variables

The correlations hint at the following characteristics:

A high correlation among the following amenities:

  • Washer and Dryer
  • Hair Dryer, Iron, Hangers

suggests that listings which offer one of them tend to offer the others as well. This is fairly self-explanatory: if an iron is present, hangers are usually present too, and if there are facilities for washing garments, such as a washer, a dryer is usually present as well. Hangers, Hair Dryer and Iron also correlate with the 24-Hour Check-in amenity, which suggests that an Airbnb offering 24-hour check-in would ideally provide them too. Whether doing so leads to a better rating and better reviews is not something the correlation alone can establish.

The high correlation of Laptop Friendly Workspace with Iron, Hangers and Hair Dryer, along with 24-Hour Check-in, suggests that listings with a laptop-friendly workspace typically also provide an iron, a hair dryer and hangers.

The rating score for communication is also highly correlated with the check-in score, which suggests that the check-in process, its handling, and the interaction that occurs during that time are an important factor in the communication rating score.


  2. What are the general characteristics of Airbnbs, and of their hosts, with a higher rating?

We shall go through the features present in the dataset one by one and compare them to find the most popular characteristic within each feature.

We'll determine this either by checking which characteristic has a higher count by a wide margin over the other characteristics present, or by splitting the data points according to their review score (high score, average score, low score) and comparing which characteristics stand out in the higher-rated data points against the others.

Investigating the characteristics of Airbnbs with the highest review score rating, 100

InΒ [Β ]:
df_analysis3 = df_analysis.copy(deep = True)
InΒ [Β ]:
#df_analysis3['review_scores_rating'].value_counts()
top_review_scores = df_analysis3[df_analysis3['review_scores_rating'] == 100]
print('the number of airbnbs w the highest score were:', top_review_scores.shape[0])
the number of airbnbs w the highest score were: 781
InΒ [Β ]:
print('top locations with the highest score were: ')
print()
print(((top_review_scores['neighbourhood_group_cleansed'].value_counts() * 100 / top_review_scores['review_scores_rating'].shape[0]).sort_values(ascending=False)));
top_loc_percentages = (top_review_scores['neighbourhood_group_cleansed'].value_counts() * 100 /top_review_scores['review_scores_rating'].shape[0]).sort_values(ascending=True)

colors = sns.color_palette('husl', len(top_loc_percentages))
top_loc_percentages.plot(kind='barh', figsize=(18, 8), color=colors)

plt.xlabel('percentage')
plt.ylabel('neighborhood group')
plt.tight_layout()
plt.show()
top locations with the highest score were: 

neighbourhood_group_cleansed
Other neighborhoods    20.102433
Capitol Hill           15.492958
Central Area           13.700384
Downtown               11.395647
Queen Anne              9.218950
West Seattle            5.761844
Ballard                 4.609475
Rainier Valley          3.457106
Cascade                 2.432778
University District     2.304738
Northgate               2.176697
Beacon Hill             2.176697
Delridge                2.048656
Magnolia                1.920615
Lake City               1.920615
Seward Park             1.024328
Interbay                0.256082
Name: count, dtype: float64
[Figure: horizontal bar chart of the percentage of top-rated listings per neighborhood group]
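The percentages above are computed by dividing the value counts by the number of rows. A shorter equivalent, sketched here on made-up data, is `value_counts(normalize=True)`, which returns shares that sum to 1 and are already sorted in descending order.

```python
import pandas as pd

# Toy stand-in for the top-rated listings (values are made up).
toy = pd.DataFrame({
    "neighbourhood_group_cleansed": ["Capitol Hill", "Capitol Hill", "Downtown", "Ballard"],
    "review_scores_rating": [100.0, 100.0, 100.0, 100.0],
})

# Share of listings per neighbourhood group, as a percentage.
share = toy["neighbourhood_group_cleansed"].value_counts(normalize=True).mul(100)
print(share)
```

Computing the series once and reusing it for both the printout and the plot avoids repeating the expression.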

Most preferred room type:

InΒ [Β ]:
print((top_review_scores.groupby('room_type')['review_scores_rating'].count().sort_values(ascending=False)));
room_type_counts = (top_review_scores.groupby('room_type')['review_scores_rating'].count().sort_values(ascending=False))
colors = sns.color_palette('Set2', n_colors=len(room_type_counts))
room_type_counts.plot.bar(figsize=(5, 4), color=colors)

plt.ylabel('review count')
plt.xlabel('room type')
plt.tight_layout()
plt.show()
room_type
Entire home/apt    542
Private room       214
Shared room         25
Name: review_scores_rating, dtype: int64
[Figure: bar chart of review count by room type]
InΒ [Β ]:
print('the most popular room type was: ', (top_review_scores.groupby('room_type')['review_scores_rating'].count().sort_values(ascending=False)).keys().tolist()[0])
the most popular room type was:  Entire home/apt

Investigating the primary feature columns

InΒ [Β ]:
allcols = df_analysis.columns
allcols = allcols.drop(['id',
 'host_location',
 'host_neighbourhood',
 'host_response_rate',
 'price',
 'weekly_price',
 'cleaning_fee',
 'extra_people',
 'maximum_nights',
 'availability_30',
 'availability_60',
 'availability_90',
 'availability_365',
 'name',
 'host_id',
 'host_name',
 'host_since',
 'amenities_Heating',
 'amenities_Essentials',
 'amenities_Wheelchair Accessible',
 'amenities_Elevator in Building',
 'amenities_Internet',
 'amenities_24-Hour Check-in',
 'amenities_Shampoo',
 'amenities_Carbon Monoxide Detector',
 'amenities_Family/Kid Friendly',
 'amenities_Breakfast',
 'amenities_Lock on Bedroom Door',
 'amenities_Safety Card',
 'amenities_Pets Allowed',
 'amenities_Other pet(s)',
 'amenities_Kitchen',
 'amenities_Air Conditioning',
 'amenities_Smoking Allowed',
 'amenities_Dryer',
 'amenities_Buzzer/Wireless Intercom',
 'amenities_Cable TV',
 'amenities_Hangers',
 'amenities_Indoor Fireplace',
 'amenities_Hair Dryer',
 'amenities_Pets live on this property',
 'amenities_Wireless Internet',
 'amenities_Washer',
 'amenities_Free Parking on Premises',
 'amenities_Gym',
 'amenities_Pool',
 'amenities_TV',
 'amenities_Doorman',
 'amenities_Smoke Detector',
 'amenities_Laptop Friendly Workspace',
 'amenities_Suitable for Events',
 'amenities_Iron',
 'amenities_First Aid Kit',
 'amenities_Dog(s)',
 'amenities_Fire Extinguisher',
 'amenities_Cat(s)',
 'amenities_Hot Tub',
 'amenities_Washer / Dryer'])
allcols = allcols.tolist()
InΒ [Β ]:
df_analysis.shape
Out[Β ]:
(3818, 115)

The dataset has been split into three categories:

  1. high_review_scores: data points with a rating of 90 and above.
  2. mid_review_scores: data points with a rating from 70 up to, but not including, 90.
  3. low_review_scores: data points with a rating below 70.
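The three bands can also be assigned as a single categorical column with `pd.cut`. The sketch below assumes the same boundaries (below 70, 70 up to 90, 90 and above) and uses a made-up series of ratings; the name `bands` is illustrative.

```python
import numpy as np
import pandas as pd

# Made-up ratings, including the boundary values and a missing rating.
ratings = pd.Series([100.0, 90.0, 89.0, 70.0, 69.0, np.nan], name="review_scores_rating")

# right=False makes each bin closed on the left: [-inf, 70), [70, 90), [90, inf).
bands = pd.cut(
    ratings,
    bins=[-np.inf, 70, 90, np.inf],
    right=False,
    labels=["low", "mid", "high"],
)
print(bands.tolist())
```

Listings without a rating stay unassigned (NaN), so they fall into none of the three groups, as with the boolean filters.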

The cells below give a comparative analysis of each feature and how its characteristics vary between the three categories. Analysis of these features follows right after the graphs.

General characteristics

InΒ [Β ]:
high_review_scores = df_analysis3[df_analysis3['review_scores_rating'] >= 90]
mid_review_scores = df_analysis3[(df_analysis3['review_scores_rating'] >= 70) & (df_analysis3['review_scores_rating'] < 90)]
low_review_scores = df_analysis3[df_analysis3['review_scores_rating'] < 70]

for cols in allcols:
    fig, axs = plt.subplots(1, 3, figsize=(15, 5))
    fig.suptitle(cols)

    #high
    data_high = ((high_review_scores.groupby(cols)['review_scores_rating'].count()) / df_analysis3.shape[0]) * 100
    if not data_high.empty:
        colors_high = sns.color_palette("crest", len(data_high))
        data_high.sort_values(ascending=False).plot.bar(ax=axs[0], title='high', color=colors_high)

    # mid
    data_mid = ((mid_review_scores.groupby(cols)['review_scores_rating'].count()) / df_analysis3.shape[0]) * 100
    if not data_mid.empty:
        colors_mid = sns.color_palette("flare", len(data_mid))
        data_mid.sort_values(ascending=False).plot.bar(ax=axs[1], title='mid', color=colors_mid)

    # low
    data_low = ((low_review_scores.groupby(cols)['review_scores_rating'].count()) / df_analysis3.shape[0]) * 100
    if not data_low.empty:
        colors_low = sns.color_palette("rocket", len(data_low))
        data_low.sort_values(ascending=False).plot.bar(ax=axs[2], title='low', color=colors_low)

    plt.tight_layout()
    plt.show()
    print('---------------------------------------------------------------------------------------------------------------------')
No description has been provided for this image
---------------------------------------------------------------------------------------------------------------------
No description has been provided for this image
---------------------------------------------------------------------------------------------------------------------
No description has been provided for this image
---------------------------------------------------------------------------------------------------------------------
No description has been provided for this image
---------------------------------------------------------------------------------------------------------------------
No description has been provided for this image
---------------------------------------------------------------------------------------------------------------------
No description has been provided for this image
---------------------------------------------------------------------------------------------------------------------
No description has been provided for this image
---------------------------------------------------------------------------------------------------------------------
No description has been provided for this image
---------------------------------------------------------------------------------------------------------------------
No description has been provided for this image
---------------------------------------------------------------------------------------------------------------------
/tmp/ipython-input-126-135017051.py:27: UserWarning: Glyph 35199 (\N{CJK UNIFIED IDEOGRAPH-897F}) missing from font(s) DejaVu Sans.
  plt.tight_layout()
/tmp/ipython-input-126-135017051.py:27: UserWarning: Glyph 38597 (\N{CJK UNIFIED IDEOGRAPH-96C5}) missing from font(s) DejaVu Sans.
  plt.tight_layout()
/tmp/ipython-input-126-135017051.py:27: UserWarning: Glyph 22270 (\N{CJK UNIFIED IDEOGRAPH-56FE}) missing from font(s) DejaVu Sans.
  plt.tight_layout()
/usr/local/lib/python3.11/dist-packages/IPython/core/pylabtools.py:151: UserWarning: Glyph 35199 (\N{CJK UNIFIED IDEOGRAPH-897F}) missing from font(s) DejaVu Sans.
  fig.canvas.print_figure(bytes_io, **kw)
/usr/local/lib/python3.11/dist-packages/IPython/core/pylabtools.py:151: UserWarning: Glyph 38597 (\N{CJK UNIFIED IDEOGRAPH-96C5}) missing from font(s) DejaVu Sans.
  fig.canvas.print_figure(bytes_io, **kw)
/usr/local/lib/python3.11/dist-packages/IPython/core/pylabtools.py:151: UserWarning: Glyph 22270 (\N{CJK UNIFIED IDEOGRAPH-56FE}) missing from font(s) DejaVu Sans.
  fig.canvas.print_figure(bytes_io, **kw)
No description has been provided for this image
---------------------------------------------------------------------------------------------------------------------
In [ ]:
## creating a separate dataframe for features with messy graphs.
analylis2 = df_analysis.copy(deep=True)
analylis2a = analylis2[[   'host_location',
    'host_neighbourhood',
    'host_response_rate',
    'price',
    'weekly_price',
    'cleaning_fee',
    'extra_people',
    'maximum_nights',
    'availability_30',
    'availability_60',
    'availability_90',
    'availability_365'
]]

analylis2a_cols = analylis2a.columns

for cols in analylis2a_cols:
    fig, axs = plt.subplots(1,3, figsize=(15,5))
    fig.suptitle(cols)

    # high ratings
    high_data = (((high_review_scores.groupby(cols)['review_scores_rating'].count()) / high_review_scores.shape[0]) * 100).sort_values(ascending=False)[:20]
    high_colors = sns.color_palette("crest", len(high_data))
    high_data.plot.bar(ax=axs[0], color=high_colors)
    axs[0].set_title('high ratings (≥90)')

    # mid ratings
    mid_data = (((mid_review_scores.groupby(cols)['review_scores_rating'].count()) / mid_review_scores.shape[0]) * 100).sort_values(ascending=False)[:20]
    mid_colors = sns.color_palette("flare", len(mid_data))
    mid_data.plot.bar(ax=axs[1], color=mid_colors)
    axs[1].set_title('mid ratings (70-89)')

    # low ratings
    low_data = (((low_review_scores.groupby(cols)['review_scores_rating'].count()) / low_review_scores.shape[0]) * 100).sort_values(ascending=False)[:20]
    low_colors = sns.color_palette("rocket", len(low_data))
    low_data.plot.bar(ax=axs[2], color=low_colors)
    axs[2].set_title('low ratings (<70)')

    plt.show()
    print('---------------------------------------------------------------------------------------------------------------------')
    print()
No description has been provided for this image
---------------------------------------------------------------------------------------------------------------------

Evaluation and analysis of the graphs of the important characteristics and trends found in the dataset

We divided the data into three categories based on review score:

high_review_scores: data points with a rating of 90 and above.
mid_review_scores: data points with a rating from 70 to 89.
low_review_scores: data points with a rating below 70.

The intention is to determine how the trends within each feature vary between the categories. This allows us to understand which features play a determining role in enabling an AirBnB to get a higher rating.
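
The split described above could be sketched as follows. This is only an illustration on made-up data: the helper name split_by_rating and the toy frame are assumptions, and the notebook's own construction of high_review_scores, mid_review_scores and low_review_scores may differ in detail.

```python
import pandas as pd

def split_by_rating(listings: pd.DataFrame, score_col: str = "review_scores_rating"):
    """Return (high, mid, low) subsets using the thresholds 90 and 70."""
    # Listings without a rating cannot be placed in any category.
    rated = listings.dropna(subset=[score_col])
    high = rated[rated[score_col] >= 90]
    mid = rated[(rated[score_col] >= 70) & (rated[score_col] < 90)]
    low = rated[rated[score_col] < 70]
    return high, mid, low

# Toy data standing in for listings.csv (values are made up for illustration).
toy = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "review_scores_rating": [98.0, 90.0, 75.0, 60.0, None],
})
high, mid, low = split_by_rating(toy)
print(len(high), len(mid), len(low))  # 2 1 1
```

Keeping the thresholds in one helper makes it easier to check that every rated listing lands in exactly one category.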

host_response_time:
While the trends are largely the same across all three datasets, the majority of data points in high_review_scores responded within an hour, and responses 'within a few hours' or 'within a day' are far less common than among data points with an average or lower rating score.

host_is_superhost:
The only category with a sizable share of superhosts (~25%) was high_review_scores. The remaining datasets had a negligible number of superhosts. This suggests that a superhost is more likely to receive a high review score than other hosts.

We can build on this observation and determine which characteristics differentiate superhosts from other hosts.
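
One possible way to start that comparison is to average a few numeric features separately for superhosts and other hosts. The sketch below uses made-up values and assumes host_is_superhost is coded as 't'/'f' and that host_response_rate has already been converted to a number; neither assumption is confirmed by this section.

```python
import pandas as pd

# Toy data (made up); the real columns would come from listings.csv.
toy = pd.DataFrame({
    "host_is_superhost": ["t", "t", "f", "f", "f"],
    "host_response_rate": [100, 100, 90, 80, 100],  # assumed already numeric
    "review_scores_rating": [98, 96, 88, 72, 91],
})

features = ["host_response_rate", "review_scores_rating"]
# Mean of each feature for superhosts ('t') versus other hosts ('f').
profile = toy.groupby("host_is_superhost")[features].mean()
# Positive values mean superhosts score higher on that feature.
gap = profile.loc["t"] - profile.loc["f"]
print(profile)
print(gap)
```

Features with the largest gaps would be the first candidates to look at more closely.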

host_identity_verified:
The high_review_scores dataset had a far greater share of hosts with verified identities, and a much lower share of unverified hosts, than the other datasets.

This suggests that a verified identity is associated with a higher score rating.

neighbourhood_group_cleansed:
AirBnBs with a higher score rating were found mostly in 'Downtown', followed by 'Capitol Hill'. The mid_review_scores dataset also had both these neighbourhoods as its top two categories.

This suggests that the ideal neighbourhood for an AirBnB is Downtown, followed by Capitol Hill.

is_location_exact:
A lower percentage (< 10%) of high_review_scores had inexact locations in comparison to the other categories.

Hosts should therefore ensure that the location they list is exact, to increase the likelihood of getting a better rating.

property_type:
AirBnBs with the highest score ratings were largely 'Houses' followed by 'Apartments', while the remaining AirBnBs with lower scores were largely Apartments followed by Houses.

Property type does admittedly play a role in the score rating; however, its role is not very significant as long as the property is either a House or an Apartment. Other property types such as a Townhouse, Loft, or Dorm have a far greater chance of contributing to a lower rating score.

room_type:
The trends across all three datasets are largely the same; however, they do show that an 'Entire home/apt' is greatly favoured by renters in comparison to other room types.

Hosts should therefore consider renting out entire homes and apartments rather than a private room or, worse, a shared room.

accommodates:
Listings should ideally accommodate 2 to 4 people. Listings of this size tend to have a higher score rating.

Other considerations:

  • bed_type should be a real bed.
  • guests_included: An average of 1 to 2 guests can be included.
  • review_scores_cleanliness: Cleanliness plays a contributing role in higher scores.
  • review_scores_communication: Good communication is imperative to obtain a higher rating.
  • review_scores_location: Location, as mentioned earlier, plays a significant role in a higher rating.
  • minimum_nights: The AirBnBs with a higher score rating largely offered a minimum of 2 nights (~42%), followed by a single night. Hosts should keep this in mind when setting up their listings.
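
Percentages such as the ~42% quoted for minimum_nights can be put side by side in one table instead of being read off three bar charts. The sketch below is a minimal illustration on made-up data; share_by_category is a hypothetical helper, not part of the notebook. Note that value_counts(normalize=True) divides by the number of non-missing values, whereas the plotting cells divide by the full row count of each category, so the two agree only when the column has no missing values.

```python
import pandas as pd

def share_by_category(groups: dict, column: str) -> pd.DataFrame:
    """Percentage of listings per value of `column`, one table column per rating group."""
    shares = {
        name: frame[column].value_counts(normalize=True) * 100
        for name, frame in groups.items()
    }
    # Values missing from a group simply have a 0% share there.
    return pd.DataFrame(shares).fillna(0.0).sort_index()

# Toy frames standing in for high_review_scores / low_review_scores (made-up values).
toy_groups = {
    "high": pd.DataFrame({"minimum_nights": [2, 2, 1, 3]}),
    "low": pd.DataFrame({"minimum_nights": [1, 1]}),
}
table = share_by_category(toy_groups, "minimum_nights")
print(table)
```

The same call could be repeated for bed_type, guests_included or any other categorical feature discussed above.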

amenities characteristics

In [ ]:
amenities_column = ['amenities_Essentials', 'amenities_Wheelchair Accessible', 'amenities_Elevator in Building',
 'amenities_Internet',
 'amenities_24-Hour Check-in',
 'amenities_Shampoo',
 'amenities_Carbon Monoxide Detector',
 'amenities_Family/Kid Friendly',
 'amenities_Breakfast',
 'amenities_Lock on Bedroom Door',
 'amenities_Safety Card',
 'amenities_Pets Allowed',
 'amenities_Other pet(s)',
 'amenities_Kitchen',
 'amenities_Air Conditioning',
 'amenities_Smoking Allowed',
 'amenities_Dryer',
 'amenities_Buzzer/Wireless Intercom',
 'amenities_Cable TV',
 'amenities_Hangers',
 'amenities_Indoor Fireplace',
 'amenities_Hair Dryer',
 'amenities_Pets live on this property',
 'amenities_Wireless Internet',
 'amenities_Washer',
 'amenities_Free Parking on Premises',
 'amenities_Gym',
 'amenities_Pool',
 'amenities_TV',
 'amenities_Doorman',
 'amenities_Smoke Detector',
 'amenities_Laptop Friendly Workspace',
 'amenities_Suitable for Events',
 'amenities_Iron',
 'amenities_First Aid Kit',
 'amenities_Dog(s)',
 'amenities_Fire Extinguisher',
 'amenities_Cat(s)',
 'amenities_Hot Tub',
 'amenities_Washer / Dryer']

for cols in amenities_column:
    fig, axs = plt.subplots(1, 3, figsize=(15, 5))
    fig.suptitle(cols.replace('amenities_', ''))

    # high ratings
    high_data = (((high_review_scores.groupby(cols)['review_scores_rating'].count()) / high_review_scores.shape[0]) * 100).sort_values(ascending=False)[:20]
    high_colors = sns.color_palette("crest", len(high_data))
    high_data.plot.bar(ax=axs[0], color=high_colors)
    axs[0].set_title('high ratings (≥90)')
    axs[0].set_ylabel('Percentage (%)')

    # mid ratings
    mid_data = (((mid_review_scores.groupby(cols)['review_scores_rating'].count()) / mid_review_scores.shape[0]) * 100).sort_values(ascending=False)[:20]
    mid_colors = sns.color_palette("flare", len(mid_data))
    mid_data.plot.bar(ax=axs[1], color=mid_colors)
    axs[1].set_title('mid ratings (70-89)')
    axs[1].set_ylabel('')

    # low ratings
    low_data = (((low_review_scores.groupby(cols)['review_scores_rating'].count()) / low_review_scores.shape[0]) * 100).sort_values(ascending=False)[:20]
    low_colors = sns.color_palette("rocket", len(low_data))
    low_data.plot.bar(ax=axs[2], color=low_colors)
    axs[2].set_title('low ratings (<70)')
    axs[2].set_ylabel('')

    plt.show()
    print('---------------------------------------------------------------------------------------------------------------------')
    print()
No description has been provided for this image
---------------------------------------------------------------------------------------------------------------------

Evaluation of which amenities are the most important and play a role in obtaining a high rating, therefore increasing revenue. Correlations also discussed:

Play a major role in a higher rating:

  • First Aid Kit
  • Shampoo
  • Wireless Internet
  • TV (listings without a TV were far less common among high-rated AirBnBs than among those with a lower rating)
  • Internet (listings without Internet were far less common among high-rated AirBnBs than among those with a lower rating)
  • Free Parking on Premises

Play a contributing role in a higher rating:

  • Safety Card (20%)
  • Hair Dryer
  • Carbon Monoxide Detector
  • Smoke Detector (listings without a Smoke Detector were far less common among high-rated AirBnBs than among those with a lower rating)
  • Fire Extinguisher
  • Pets living on property
  • Google host verified

Play a negligible role in a higher rating:

  • Washer / Dryer
  • 24-Hour Check-in
  • Buzzer / Intercom
  • Dogs
  • Laptop Friendly Workspace
  • Family/Kid Friendly
  • Cable TV
  • Hangers
  • Iron
  • Indoor Fireplace
  • Host verified facebook
  • Host verification_Kba

The following amenities were found in all categories:

  • Dryer
  • Kitchen
  • Washer
  • Heating
  • Host verifications phone
  • Host verifications email
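
The correlation between one amenity and another could be checked with a sketch like the one below. It assumes the amenities_ columns are 0/1 indicators (an assumption; their encoding is set up outside this section) and uses made-up values. For two 0/1 columns the Pearson correlation returned by corr() is the phi coefficient, so values near 1 mean the two amenities tend to appear together.

```python
import pandas as pd

# Toy 0/1 indicator columns (made up); names mirror the notebook's amenities_ prefix.
toy = pd.DataFrame({
    "amenities_TV": [1, 1, 0, 0, 1, 0],
    "amenities_Cable TV": [1, 1, 0, 0, 0, 0],
    "amenities_Kitchen": [1, 1, 1, 1, 1, 1],
    "review_scores_rating": [95, 97, 80, 65, 92, 70],
})

amenity_cols = [c for c in toy.columns if c.startswith("amenities_")]
# Amenities present in every listing have no variance, so their correlation is undefined.
varying = [c for c in amenity_cols if toy[c].nunique() > 1]

# Pairwise correlation between amenities.
amenity_corr = toy[varying].corr()
# Correlation of each amenity with the rating, strongest first.
rating_corr = toy[varying].corrwith(toy["review_scores_rating"]).sort_values(ascending=False)
print(amenity_corr)
print(rating_corr)
```

Dropping constant columns first matters here because amenities found in almost every listing (such as Kitchen in the toy data) would otherwise produce undefined entries.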

host characteristics

In [ ]:
## Creating a separate dataframe for features with messy graphs.
analylis2 = df_analysis.copy(deep=True)
analylis2a = analylis2[[   'host_location',
    'host_neighbourhood',
    'host_response_rate',
    'price',
    'weekly_price',
    'cleaning_fee',
    'extra_people',
    'maximum_nights',
    'availability_30',
    'availability_60',
    'availability_90',
    'availability_365'
]]

analylis2a_cols = analylis2a.columns

for cols in analylis2a_cols:
    fig, axs = plt.subplots(1, 3, figsize=(16, 4.5))

    # high ratings
    high_data = (((high_review_scores.groupby(cols)['review_scores_rating'].count()) / high_review_scores.shape[0]) * 100).sort_values(ascending=False)[:20]
    high_colors = sns.color_palette("crest", len(high_data))
    high_data.plot.bar(ax=axs[0], color=high_colors)
    axs[0].set_title('high ratings (≥90)', pad=10)
    axs[0].set_ylabel('percentage', labelpad=10)

    # mid ratings
    mid_data = (((mid_review_scores.groupby(cols)['review_scores_rating'].count()) / mid_review_scores.shape[0]) * 100).sort_values(ascending=False)[:20]
    mid_colors = sns.color_palette("flare", len(mid_data))
    mid_data.plot.bar(ax=axs[1], color=mid_colors)
    axs[1].set_title('mid ratings (70-89)', pad=10)
    axs[1].set_ylabel('')

    # low ratings
    low_data = (((low_review_scores.groupby(cols)['review_scores_rating'].count()) / low_review_scores.shape[0]) * 100).sort_values(ascending=False)[:20]
    low_colors = sns.color_palette("rocket", len(low_data))
    low_data.plot.bar(ax=axs[2], color=low_colors)
    axs[2].set_title('low ratings (<70)', pad=10)
    axs[2].set_ylabel('')

    plt.show()
    print('---------------------------------------------------------------------------------------------------------------------')
    print()
No description has been provided for this image
---------------------------------------------------------------------------------------------------------------------

Evaluation of Host Characteristics

host_neighbourhood: The most successful hosts were found in Capitol Hill and Belltown, while hosts with lower ratings were found (by a higher margin) in the Ravenna neighbourhood. Further analysis, possibly with other datasets, may be needed to find out why this trend is observed within these particular neighbourhoods.

host_response_rate: About 70% of the highest-rated AirBnBs had a 100% response rate, in comparison to ~50% and ~37% for the remaining two categories.

This reveals an important characteristic of hosts with the highest-rated AirBnBs: they are very responsive.
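
A figure like the ~70% above could be reproduced with a short helper. The sketch below assumes host_response_rate is stored as text such as '100%' with some missing entries (an assumption about the raw column); share_fully_responsive is a hypothetical name and the values are made up.

```python
import pandas as pd

def share_fully_responsive(rates: pd.Series) -> float:
    """Percentage of hosts with a known response rate whose rate is exactly 100%."""
    # Strip the trailing '%' and turn anything unparseable (missing values) into NaN.
    numeric = pd.to_numeric(rates.astype(str).str.rstrip("%"), errors="coerce").dropna()
    return float((numeric == 100).mean() * 100)

# Toy values standing in for one rating category (made up).
toy_rates = pd.Series(["100%", "100%", "90%", None, "100%", "50%"])
share = share_fully_responsive(toy_rates)
print(round(share, 1))
```

Hosts with no recorded response rate are left out of the denominator here, which is a design choice; counting them as non-responsive would lower the percentage.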

extra_people:

About 50% of the highest-rated AirBnBs did not charge for extra people, while the rest charged less than $10. In comparison, the other AirBnBs charged a slightly higher rate.

availability (30, 60, 90, 365):

According to the AirBnB website, availability refers to:

"Your availability is when and how often you offer your experience, and it impacts how easily potential guests find your experience on Airbnb. Those settings are of course entirely up to you, and we hope the following tips on availability will provide some inspiration to help you reach your hosting goals."

It is therefore understood that a host ought to offer more available slots so that the AirBnB becomes popular. The results reinforce this assumption: the share of high-rated AirBnBs that are unavailable (i.e. 0 days) over the next 30, 60, 90 and 365 days is smaller than that of AirBnBs with average and lower ratings.

Therefore, the fewer available slots a host offers, the less likely the host is to get a good rating.
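
The comparison of unavailable listings could be expressed numerically as in the sketch below. The helper share_unavailable and the toy frames are hypothetical, and only two of the four availability windows are shown to keep the example short.

```python
import pandas as pd

WINDOWS = ["availability_30", "availability_365"]  # the notebook also uses _60 and _90

def share_unavailable(frame: pd.DataFrame, windows=WINDOWS) -> pd.Series:
    """Percentage of listings with zero available days in each window."""
    return frame[windows].eq(0).mean() * 100

# Toy frames standing in for high_review_scores / low_review_scores (made-up values).
toy_high = pd.DataFrame({"availability_30": [0, 10, 20, 30],
                         "availability_365": [0, 100, 200, 365]})
toy_low = pd.DataFrame({"availability_30": [0, 0, 5, 0],
                        "availability_365": [0, 0, 300, 10]})

comparison = pd.DataFrame({
    "high": share_unavailable(toy_high),
    "low": share_unavailable(toy_low),
})
print(comparison)
```

A lower percentage in the high column than in the other columns would be consistent with the observation above.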

3. How do the prices and the number of bookings fluctuate with regard to time? What are the trends observed within the AirBnBs of the three different categories?

Variation in the number of available bookings with time

In [ ]:
df_calendar = pd.read_csv('calendar.csv')
df_calendar['date'] = pd.to_datetime(df_calendar['date'])
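In [ ]:
# Keep only the rows where the listing is marked as available ('t')
df_calendar_available = df_calendar[df_calendar['available'] == 't']
# Count the available listings per date (counting listing_id avoids plotting one line per column)
df_cal_a = df_calendar_available.groupby('date')[['listing_id']].count()
df_cal_a.plot.line(figsize=(21,5))
Out[ ]: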
<Axes: xlabel='date'>
No description has been provided for this image

The time series shows that the largest number of listings was available towards the end of March. Availability increased from January to March, dipped in April, and remained roughly constant until the end of June. It decreased further in July, after which it gradually increased until January.

The number of available listings therefore peaks in March and is lowest in January. Note that an available date is one that has not been booked, so this series tracks open slots rather than confirmed bookings, and low availability may reflect either high demand or dates blocked by hosts. The prospective host should also consider a comparative analysis on datasets from previous years to check that this pattern is consistent and therefore reliable.
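To name the peak and trough months without reading them off the chart, the daily counts can be averaged per month. The sketch below builds a small synthetic calendar in the same long format as `calendar.csv` (one row per listing per date); the listings and their availability rule are invented for illustration.

```python
import pandas as pd

# Synthetic calendar: one row per listing per date, as in calendar.csv.
# The availability rule below is invented purely for illustration.
dates = pd.date_range("2016-01-01", "2016-03-31", freq="D")
rows = []
for listing_id in range(1, 5):  # four hypothetical listings
    for d in dates:
        # Listing k (k = 1..3) is available from month k onwards; listing 4 never is.
        available = "t" if (listing_id < 4 and d.month >= listing_id) else "f"
        rows.append({"listing_id": listing_id, "date": d, "available": available})
calendar = pd.DataFrame(rows)

# Number of available listings per day.
daily_available = (
    calendar[calendar["available"] == "t"].groupby("date")["listing_id"].count()
)

# Average daily availability within each calendar month.
monthly_mean = daily_available.groupby(daily_available.index.to_period("M")).mean()

print(monthly_mean)
print("highest availability:", monthly_mean.idxmax())
print("lowest availability:", monthly_mean.idxmin())
```

Averaging within each month smooths out weekday effects and makes the comparison between months less sensitive to single-day spikes.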

Price variation with time

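In [ ]:
# Work on a copy, drop rows with missing values (for example, dates without a price),
# and convert price strings such as "$1,000.00" to floats
df_calendar_price = df_calendar.copy(deep=True)
df_calendar_price.dropna(inplace=True)
df_calendar_price['price'] = df_calendar_price.price.replace(r'[\$,]', '', regex=True).astype(float)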
In [ ]:
# Mean of the numeric columns per date, across all listings
df_cal_b = df_calendar_price.groupby('date')[df_calendar_price.select_dtypes(include='number').columns].mean()
#df_cal_b.drop(columns='listing_id').plot(figsize=(21,8))
In [ ]:
# Mean daily price of the listings in the high-rating group.
# 'price' is already a float in df_calendar_price, so no further cleaning is needed.
df_calendar_price_high = df_calendar_price[df_calendar_price.listing_id.isin(high_review_scores.id)]
df_calendar_price_high = df_calendar_price_high.groupby('date')[['price']].mean()
# df_calendar_price_high.plot.line(figsize=(21,5))
# df_calendar_price_high.head(1)
In [ ]:
# Mean daily price of the listings in the mid-rating group.
# 'price' is already a float in df_calendar_price, so no further cleaning is needed.
df_calendar_price_mid = df_calendar_price[df_calendar_price.listing_id.isin(mid_review_scores.id)]
df_calendar_price_mid = df_calendar_price_mid.groupby('date')[['price']].mean()
# df_calendar_price_mid.plot.line(figsize=(21,5))
# df_calendar_price_mid.head(1)
In [ ]:
# Mean daily price of the listings in the low-rating group.
# 'price' is already a float in df_calendar_price, so no further cleaning is needed.
df_calendar_price_low = df_calendar_price[df_calendar_price.listing_id.isin(low_review_scores.id)]
df_calendar_price_low = df_calendar_price_low.groupby('date')[['price']].mean()
# df_calendar_price_low.plot.line(figsize=(21,5))
# df_calendar_price_low.head(1)
In [ ]:
# Join the three daily mean-price series on date. The first merge yields price_x (high)
# and price_y (mid); the second adds price (low).
price_var_time = pd.merge(pd.merge(df_calendar_price_high, df_calendar_price_mid, on='date'), df_calendar_price_low, on='date')
price_var_time.rename(columns={"price_x": "high", "price_y": "mid", "price": "low"}, inplace=True)
In [ ]:
price_var_time.plot(figsize=(20, 8))
Out[ ]:
<Axes: xlabel='date'>
No description has been provided for this image

Evaluation of how price and bookings fluctuate with time:

The above graph shows the pattern of the changes in price for each category over the year.

The results show that for the Airbnbs with the lowest ratings, prices change rapidly and then stay constant; they are also, on average, higher than those of the Airbnbs with an average rating. Notably, compared with the other two categories, the prices of these Airbnbs are the highest at the beginning of the year and the lowest in the second half of the year.

The prices of the Airbnbs with the highest ratings rise steadily and gradually from the beginning of the year until June and July, after which they slowly start to decrease. The changes are gradual, with very few rapid jumps.

The Airbnbs with an average rating follow a trend similar to that of the higher-rated Airbnbs; however, their prices are on average lower.

The results show that it is best if:

  1. The prices change gradually, with little fluctuation.
  2. The ideal time to earn the highest revenue is from May to July, while the beginning of the year has the lowest prices.
  3. In order to stay competitive as well as earn a profit, it is imperative that the above-mentioned trends and variations in prices are kept in mind. This would help the Airbnb remain competitive while staying lucrative for the prospective host.
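The distinction between gradual and rapid price changes can also be quantified rather than judged from the plot, for example through the size of the day-to-day changes in each category. The sketch below reuses the `price_var_time` name and its `high`, `mid` and `low` columns from the analysis, but fills the frame with invented prices, so the numbers say nothing about the Seattle data.

```python
import pandas as pd

# Invented daily mean prices for the three rating categories (illustration only).
dates = pd.date_range("2016-01-01", periods=6, freq="D")
price_var_time = pd.DataFrame(
    {
        "high": [150, 151, 152, 153, 154, 155],
        "mid": [120, 121, 121, 122, 123, 123],
        "low": [160, 140, 140, 140, 125, 125],
    },
    index=dates,
)

# Absolute change from one day to the next; the first row is NaN and is skipped.
daily_change = price_var_time.diff().abs()

largest_jump = daily_change.max()   # biggest single-day move per category
mean_change = daily_change.mean()   # typical day-to-day move per category

print(pd.DataFrame({"largest_jump": largest_jump, "mean_change": mean_change}))
```

A category whose largest jump is far above its mean change moves in steps, while one where the two are close changes gradually.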